Abstract

We calculate with Monte Carlo the goodness of fit and the confidence level of 
the standard allowed regions for the neutrino oscillation parameters obtained 
from the fit of solar neutrino data. We show that the values of the goodness of 
fit and of the confidence level of the allowed regions are significantly smaller 
than the standard ones. Using Neyman's method, we also calculate exact 
allowed regions with correct frequentist coverage. We show that the standard 
allowed region around the global minimum of the least-squares function is 
a reasonable approximation of the exact one, whereas the size of the other 
regions is dramatically underestimated in the standard method. 
I. INTRODUCTION



The standard method to analyze solar neutrino data in terms of neutrino oscillations
consists in performing a least-squares fit. However, for the reasons described in Section II,
the standard least-squares analysis of solar neutrino data is approximate from a statistical
point of view.

In this paper we present statistical methods based on Monte Carlo numerical calculations
that improve the implementation of the least-squares fit of solar neutrino data. In
Section II we review the standard method and we discuss why its approximate assumptions
could lead to significant inaccuracy in the results. In Section III we present a Monte Carlo
method for calculating the goodness of fit of solar neutrino data. In Section IV
we present a Monte Carlo method for calculating the confidence level of the usual
allowed regions in the space of the neutrino oscillation parameters. In Section V we present
an implementation for solar neutrino analysis of the classical frequentist Neyman method,
which yields exact confidence regions with correct coverage.

Since the purpose of this paper is to illustrate different methods for the statistical analysis
of solar neutrino data, we consider for simplicity only the data relative to the total rates
measured in the Homestake [1] and Super-Kamiokande [2] experiments, and the weighted
average of the total rates measured in the two Gallium experiments GALLEX [3] and SAGE
[4]. The values of these rates are given in Table I of Ref. [5]. Updated results of the
Super-Kamiokande experiment and first results of the new GNO experiment have been presented
in the recent Neutrino 2000 conference [6]. Since the numerical calculations presented here
take a long time and were started before the Neutrino 2000 conference, we do not take
into account the new data. A complete analysis including the new data and the
Super-Kamiokande data relative to the electron energy spectrum and the zenith-angle distribution
is under way and will be published elsewhere [7].

Neutrino oscillations^1 depend on the mass-squared difference $\Delta m^2 = m_2^2 - m_1^2$ and on
the mixing angle $\vartheta$, which is restricted to the interval $[0, \pi/2]$. Traditionally solar
neutrino data have been analyzed in terms of the parameters $\Delta m^2$ and $\sin^2 2\vartheta$, that
determine the probability of neutrino oscillations in vacuum. However, it has recently been shown
that the parameter $\tan^2\vartheta$ is more convenient for finding the allowed regions in the interval
$\pi/4 < \vartheta < \pi/2$ when matter effects are important [10,11]^2. Moreover, the parameter
$\tan^2\vartheta$ allows a better view of the regions at large mixing angles with respect to the usual
parameter $\sin^2 2\vartheta$. Hence, in the following we analyze the solar neutrino data in terms of
the parameters $\Delta m^2$ and $\tan^2\vartheta$.

Our calculation of the theoretical event rates follows the standard method described in
several papers for matter-enhanced MSW [16] transitions [17-19] and vacuum oscillations
[20,19]. We calculate the MSW survival probability of $\nu_e$'s in the Sun using the standard
analytic prescription [21,11,17,19] and the level-crossing probability appropriate for an
exponential density profile [22,17]. We calculate the regeneration in the Earth using a two-step
model of the Earth density profile [23-27], that is known to produce results that do not
differ appreciably from those obtained with the correct density profile. We have used the
tables of neutrino fluxes, solar density and radiochemical detector cross sections available
in Bahcall's web page [28]. For simplicity we have neglected the matter effects that slightly
affect the vacuum oscillation solutions of the solar neutrino problem, as discussed in [29,30].

1 Here we consider the minimal two-neutrino model, although more complicated models are
possible (see [8,9]).

2 For the same reason the parameter $\tan^2\vartheta$ has been employed in the framework of
three-neutrino mixing [12-14] and the parameter $\sin^2\vartheta$ has been employed in the framework
of four-neutrino mixing [15].




II. STANDARD STATISTICAL ANALYSIS 

The traditional way to find the values of the neutrino oscillation parameters $\Delta m^2$,
$\tan^2\vartheta$ allowed by solar neutrino data is to perform a least-squares fit, often called
"$\chi^2$ fit". In this method the estimates $\widehat{\Delta m^2}$, $\widehat{\tan^2\vartheta}$ of the
parameters $\Delta m^2$, $\tan^2\vartheta$ are obtained by minimizing the least-squares function

$$ X^2 = \sum_{j_1,j_2} \left( R^{(\mathrm{thr})}_{j_1} - R^{(\mathrm{exp})}_{j_1} \right) (V^{-1})_{j_1 j_2} \left( R^{(\mathrm{thr})}_{j_2} - R^{(\mathrm{exp})}_{j_2} \right) , \qquad (1) $$

where $V$ is the covariance matrix of experimental and theoretical uncertainties,
$R^{(\mathrm{exp})}_j$ is the event rate measured in the $j$-th experiment and $R^{(\mathrm{thr})}_j$ is the
corresponding theoretical event rate, that depends on $\Delta m^2$ and $\tan^2\vartheta$.

The standard method for the calculation of the covariance matrix $V$ is the one presented
in Refs. [31,32], in which the independent uncertainties $\sigma_j^2$ of the experimental rates
$R^{(\mathrm{exp})}_j$ and the uncertainties of the theoretical rates $R^{(\mathrm{thr})}_j$ are added in
quadrature. Here we use this method, with the only difference that we assume a complete
correlation of the errors of the averaged cross sections for the fluxes in each experiment.
Since these correlations are not known, the choice of complete correlations is the safest
approach. Hence, using the notation of Refs. [31,32], the covariance matrix $V$ is given by^3



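$$ V_{j_1 j_2} = \delta_{j_1 j_2} \, \sigma_{j_1}^2 + \delta_{j_1 j_2} \sum_{i_1,i_2} R^{(\mathrm{thr})}_{i_1 j_1} R^{(\mathrm{thr})}_{i_2 j_1} \, \Delta\ln C^{(\mathrm{thr})}_{i_1 j_1} \, \Delta\ln C^{(\mathrm{thr})}_{i_2 j_1} + \sum_{i_1,i_2} R^{(\mathrm{thr})}_{i_1 j_1} R^{(\mathrm{thr})}_{i_2 j_2} \sum_k \alpha_{i_1 k} \, \alpha_{i_2 k} \, (\Delta\ln X_k)^2 , \qquad (2) $$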



where 



$$ R^{(\mathrm{thr})}_{ij} = \phi_i^{\mathrm{SSM}} \, C^{(\mathrm{thr})}_{ij} \qquad (3) $$

is the event rate in the $j$-th experiment due to the neutrino flux $\phi_i^{\mathrm{SSM}}$ produced in the
$i$-th thermonuclear reaction in the Sun according to the SSM and $C^{(\mathrm{thr})}_{ij}$ is the
corresponding energy-averaged cross section, that depends on $\Delta m^2$ and $\tan^2\vartheta$. The quantity
$\Delta\ln C^{(\mathrm{thr})}_{ij} = \Delta C^{(\mathrm{thr})}_{ij} / C^{(\mathrm{thr})}_{ij}$ is the relative uncertainty of the
energy-averaged cross section $C^{(\mathrm{thr})}_{ij}$, which is taken to be approximately equal to the
one calculated without neutrino oscillations.

The quantities $X_k$ are the input astrophysical parameters in the SSM, whose relative
uncertainties $\Delta\ln X_k$ determine the correlated uncertainties of the neutrino fluxes
$\phi_i^{\mathrm{SSM}}$ through the logarithmic derivatives

$$ \alpha_{ik} = \frac{\partial \ln \phi_i^{\mathrm{SSM}}}{\partial \ln X_k} . \qquad (4) $$

The values of $\Delta\ln C^{(\mathrm{thr})}_{ij}$, $\alpha_{ik}$, $\Delta\ln X_k$ are given in Ref. [32].

Notice that, since the theoretical rates $R^{(\mathrm{thr})}_{ij}$ depend on $\Delta m^2$ and
$\tan^2\vartheta$, the covariance matrix $V$ also depends on $\Delta m^2$ and $\tan^2\vartheta$.

3 The indices $j, j_1, j_2 = 1, 2, 3$ indicate the three solar neutrino experiments GALLEX+SAGE
[3,4], Homestake [1] and Super-Kamiokande [2], respectively. The indices $i, i_1, i_2 = 1, \ldots, 8$
denote the solar neutrino fluxes produced in the eight solar thermonuclear reactions pp, pep, hep,
Be, B, N, O, F, respectively. The index $k = 1, \ldots, 11$ indicates the eleven input astrophysical
parameters in the SSM (see Refs. [31,32]).
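
The ingredients described above (experimental errors, fully correlated cross-section errors within each experiment, and flux errors correlated through the SSM input parameters) can be combined numerically. The following is a minimal sketch, assuming first-order error propagation; the function names and the numbers in the checks are hypothetical and are not taken from Ref. [32].

```python
def covariance_matrix(R, dlnC, alpha, dlnX, sigma):
    """Covariance matrix of the rates.

    R[i][j], dlnC[i][j]: partial rate and relative cross-section error of
    flux i in experiment j; alpha[i][k]: logarithmic derivatives of flux i
    with respect to SSM input k; dlnX[k]: relative errors of the SSM inputs;
    sigma[j]: experimental errors.
    """
    n_flux, n_exp, n_par = len(R), len(sigma), len(dlnX)
    V = [[0.0] * n_exp for _ in range(n_exp)]
    for j1 in range(n_exp):
        for j2 in range(n_exp):
            # Flux term: correlated among experiments through the SSM inputs.
            v = sum(
                R[i1][j1] * R[i2][j2]
                * sum(alpha[i1][k] * alpha[i2][k] * dlnX[k] ** 2 for k in range(n_par))
                for i1 in range(n_flux)
                for i2 in range(n_flux)
            )
            if j1 == j2:
                # Experimental error plus completely correlated cross-section errors.
                v += sigma[j1] ** 2
                v += sum(R[i][j1] * dlnC[i][j1] for i in range(n_flux)) ** 2
            V[j1][j2] = v
    return V


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def least_squares(R_thr, R_exp, V):
    """X^2 = d^T V^{-1} d with d = R_thr - R_exp."""
    d = [t - e for t, e in zip(R_thr, R_exp)]
    return sum(di * yi for di, yi in zip(d, solve(V, d)))
```

Because the partial rates depend on the oscillation parameters, such a matrix has to be rebuilt at every point of the parameter space before the least-squares function is evaluated.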

In the traditional method the minimum $X^2_{\min}$ of (1) provides the estimate of the neutrino
oscillation parameters, usually called "best-fit values", $\widehat{\Delta m^2}$ and
$\widehat{\tan^2\vartheta}$. The goodness of the fit is estimated by calculating the probability to
observe a minimum of $X^2$ larger than the one actually observed, assuming for $X^2_{\min}$ a $\chi^2$
distribution with $N_{\mathrm{exp}} - N_{\mathrm{par}} = 1$ degrees of freedom, where $N_{\mathrm{exp}} = 3$ is the
number of experimental data points (the sums over $j_1$ and $j_2$ in Eq. (1) are from 1 to
$N_{\mathrm{exp}}$) and $N_{\mathrm{par}} = 2$ is the number of fitted parameters. Calling $\alpha$ this probability,
one says that the fit is acceptable at $100\alpha\%$ CL. If $\alpha$ is larger than a minimum acceptable
value, usually $\sim 10^{-2}$, the fit is considered to be acceptable and one can proceed further to
determine the uncertainties in the determination of the parameters $\Delta m^2$ and $\tan^2\vartheta$
(the allowed regions in parameter space).
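
For one degree of freedom the standard goodness of fit has a closed form, since a $\chi^2$ variable with one degree of freedom is the square of a standard normal variable. The short sketch below (the function name is ours) evaluates it for the $X^2_{\min}$ values quoted in Table I.

```python
import math


def standard_gof_1dof(x2_min):
    """P(chi^2 > x2_min) for a chi^2 distribution with one degree of freedom."""
    return math.erfc(math.sqrt(x2_min / 2.0))


# X^2_min of the SMA, LMA, LOW and VO local minima quoted in Table I.
for region, x2_min in [("SMA", 0.42), ("LMA", 3.46), ("LOW", 6.53), ("VO", 1.29)]:
    print(region, f"{100.0 * standard_gof_1dof(x2_min):.1f}%")
```

Up to the rounding of $X^2_{\min}$, these values agree with the "standard GOF" column of Table I.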

The standard regions of the parameters allowed at $100\beta\%$ CL are those that satisfy the
condition

$$ X^2 = X^2_{\min} + \Delta X^2(\beta) , \qquad (5) $$

where $\Delta X^2(\beta)$ is given by the value of $\chi^2$ such that the cumulative $\chi^2$ distribution
for $N_{\mathrm{par}} = 2$ degrees of freedom (the number of parameters) is equal to $\beta$. Common values for
$\beta$ are 0.90 ($1.64\sigma$), 0.95 ($1.96\sigma$), 0.99 ($2.58\sigma$), 0.9973 ($3.00\sigma$), which give
$\Delta X^2(0.90) = 4.61$, $\Delta X^2(0.95) = 5.99$, $\Delta X^2(0.99) = 9.21$, $\Delta X^2(0.9973) = 11.83$.
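
These numbers are easy to reproduce: for two degrees of freedom the cumulative $\chi^2$ distribution is $1 - e^{-x/2}$, so that $\Delta X^2(\beta) = -2\ln(1-\beta)$, and the number of standard deviations is the two-sided Gaussian quantile. A small sketch (function names are ours):

```python
import math
from statistics import NormalDist


def delta_x2_2dof(beta):
    """Delta X^2 enclosing probability beta for two degrees of freedom."""
    return -2.0 * math.log(1.0 - beta)


def n_sigma(beta):
    """Number of Gaussian standard deviations of a two-sided interval with probability beta."""
    return NormalDist().inv_cdf(0.5 * (1.0 + beta))


for beta in (0.90, 0.95, 0.99, 0.9973):
    print(f"beta = {beta}: {n_sigma(beta):.2f} sigma, Delta X^2 = {delta_x2_2dof(beta):.2f}")
```

The same `n_sigma` conversion applies to the confidence levels obtained by Monte Carlo in Table II.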

This procedure would be correct if the theoretical rates $R^{(\mathrm{thr})}_j$ depended linearly on the
parameters $\Delta m^2$ and $\tan^2\vartheta$ to be determined in the fit and the errors
$R^{(\mathrm{thr})}_j - R^{(\mathrm{exp})}_j$ were multinormally distributed with constant covariance matrix $V$.
Indeed, if these requirements were realized one could prove that $X^2$ has a $\chi^2$ distribution
with $N_{\mathrm{exp}} = 3$ degrees of freedom, $X^2_{\min}$ has a $\chi^2$ distribution with
$N_{\mathrm{exp}} - N_{\mathrm{par}} = 1$ degrees of freedom, and $X^2 - X^2_{\min}$ has a $\chi^2$ distribution with
$N_{\mathrm{par}} = 2$ degrees of freedom (see [34-36]). In this case the $X^2$ function
would depend quadratically on the parameters and there would be only one allowed region
with ellipsoidal form in the space of the parameters $\Delta m^2$ and $\tan^2\vartheta$.
In the case of solar neutrino data the Gaussian distribution of experimental and theoretical
uncertainties seems to be widely accepted, although it is not clear if this assumption is
appropriate for the theoretical errors. On the other hand it is clear that

1. The theoretical rates $R^{(\mathrm{thr})}_j$ do not depend at all linearly on the parameters
$\Delta m^2$, $\tan^2\vartheta$. This is the reason why there are several allowed regions in the
$\tan^2\vartheta$-$\Delta m^2$ plane (or the more traditional $\sin^2 2\vartheta$-$\Delta m^2$ plane) and
these regions do not have elliptic form (see [...]).

2. The covariance matrix $V$ is not constant, but depends on $\Delta m^2$ and $\tan^2\vartheta$, as
remarked after Eq. (4).

3. The errors $R^{(\mathrm{thr})}_j - R^{(\mathrm{exp})}_j$ are not multinormally distributed, because
although the fluxes $\phi_i^{\mathrm{SSM}}$ and the cross sections $C^{(\mathrm{thr})}_{ij}$ are assumed to be
multinormally distributed, their products (3), that determine the theoretical rates through the
relations

$$ R^{(\mathrm{thr})}_j = \sum_i R^{(\mathrm{thr})}_{ij} , \qquad (6) $$

are not multinormally distributed (see [38]).



Hence, the usual method of calculating the goodness of fit and the allowed regions in the
$\tan^2\vartheta$-$\Delta m^2$ plane is not guaranteed to give correct results, i.e. the goodness of fit
could be significantly different from $100\alpha\%$ and the confidence level of the regions enclosed by
borders with constant $X^2 = X^2_{\min} + \Delta X^2(\beta)$ could be significantly different from $100\beta\%$.

We believe that the largest correction is due to the non-linear dependence of the theoretical
rates $R^{(\mathrm{thr})}_j$ on the parameters $\Delta m^2$, $\tan^2\vartheta$, that causes the existence of
more than one local minimum of the least-squares function $X^2$. This implies that there are more
possibilities to obtain good fits of the data and the true goodness of fit is likely to be smaller
than $100\alpha\%$. Also, in repeated experiments the global minimum has a significant chance to occur
far from the true (unknown) value of the parameters $\Delta m^2$, $\tan^2\vartheta$, so that the
probability that the allowed regions cover the true value is smaller than in the linear case.
Hence, we expect that the true confidence level of a usual $100\beta\%$ CL allowed region is smaller
than $\beta$.

In the following sections of this paper we perform a least-squares fit of the solar neutrino
data using the $X^2_{\min}$ estimator for the neutrino oscillation parameters $\Delta m^2$,
$\tan^2\vartheta$. We assume the usual Gaussian distribution for the experimental and theoretical
uncertainties. In Section III we calculate the goodness of fit using the Monte Carlo method, that
is applicable in any case in which the distribution of the uncertainties is known (see, for
example, Section 15.6 of [36]). In Section IV we calculate with the Monte Carlo method the
confidence level of the usual allowed regions in the $\tan^2\vartheta$-$\Delta m^2$ parameter space. In
Section V we implement the classical frequentist Neyman method for finding exact confidence
regions with correct coverage at a given confidence level.



III. GOODNESS OF FIT 

In order to calculate the goodness of fit, our method proceeds as follows (see, for example,
Section 15.6 of [36]). We estimate the best-fit values of $\Delta m^2$, $\tan^2\vartheta$ through the
minimum of $X^2$ in Eq. (1) and we call these best-fit values $\widehat{\Delta m^2}$,
$\widehat{\tan^2\vartheta}$. Then we assume that $\widehat{\Delta m^2}$, $\widehat{\tan^2\vartheta}$ are
reasonable surrogates of the true values $\Delta m^2_{\mathrm{true}}$, $\tan^2\vartheta_{\mathrm{true}}$ and that the
probability distribution of the differences $\widehat{\Delta m^2}_{(k)} - \widehat{\Delta m^2}$,
$\widehat{\tan^2\vartheta}_{(k)} - \widehat{\tan^2\vartheta}$ is not too different from the true
distribution of the differences $\widehat{\Delta m^2}_{(k)} - \Delta m^2_{\mathrm{true}}$,
$\widehat{\tan^2\vartheta}_{(k)} - \tan^2\vartheta_{\mathrm{true}}$ in a large set of best-fit parameters
$\widehat{\Delta m^2}_{(k)}$, $\widehat{\tan^2\vartheta}_{(k)}$ ($k = 1, 2, \ldots$) obtained with hypothetical
experiments.

Using $\widehat{\Delta m^2}$, $\widehat{\tan^2\vartheta}$ as surrogates of the true values, we generate
$N_s$ synthetic random data sets with the usual Gaussian distribution for the experimental and
theoretical uncertainties. We apply the least-squares method to each synthetic data set, leading
to an ensemble of simulated best-fit parameters $\widehat{\Delta m^2}_{(s)}$,
$\widehat{\tan^2\vartheta}_{(s)}$ with $s = 1, \ldots, N_s$, each one with its associated
$(X^2_{\min})_s$. Then we calculate the goodness of the fit as the fraction of simulated
$(X^2_{\min})_s$ in the ensemble that are larger than the one actually observed, $X^2_{\min}$.
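
The procedure can be sketched in a few lines of code. The example below is only an illustration under simplifying assumptions: a hypothetical one-parameter, oscillation-like toy model with three "rates", uncorrelated Gaussian errors of equal size, and a minimization performed by scanning a grid. None of these choices is taken from the actual solar neutrino calculation.

```python
import math
import random


def x2(data, rates, sigma):
    """Least-squares function for uncorrelated errors of equal size sigma."""
    return sum(((d - r) / sigma) ** 2 for d, r in zip(data, rates))


def fit(data, model, grid, sigma):
    """Return (best-fit parameter, X^2_min) from a scan over the grid."""
    best = min(grid, key=lambda p: x2(data, model(p), sigma))
    return best, x2(data, model(best), sigma)


def monte_carlo_gof(observed, model, grid, sigma, n_sets=1000, seed=1):
    """Fraction of synthetic data sets whose X^2_min exceeds the observed one."""
    rng = random.Random(seed)
    p_hat, x2_obs = fit(observed, model, grid, sigma)
    larger = 0
    for _ in range(n_sets):
        # Synthetic data generated around the best fit (surrogate of the truth).
        synthetic = [rng.gauss(r, sigma) for r in model(p_hat)]
        _, x2_s = fit(synthetic, model, grid, sigma)
        larger += x2_s > x2_obs
    return larger / n_sets


def toy_model(p):
    """Hypothetical non-linear toy rates with several local minima in p."""
    return [1.0 - 0.5 * math.sin(k * p) ** 2 for k in (1.0, 2.3, 3.7)]


grid = [0.02 * n for n in range(158)]  # scan of p between 0 and about pi
```

A call such as `monte_carlo_gof([0.70, 0.85, 0.60], toy_model, grid, 0.05)` returns the Monte Carlo goodness of fit of the hypothetical data; restricting `grid` to the neighborhood of one minimum corresponds to the "local" calculation discussed below.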

We calculate the synthetic data sets generating random neutrino fluxes $\phi_i$ with a
multinormal distribution centered on the SSM fluxes $\phi_i^{\mathrm{SSM}}$ and having the covariance matrix

$$ V^{(\phi)}_{i_1 i_2} = \phi_{i_1}^{\mathrm{SSM}} \, \phi_{i_2}^{\mathrm{SSM}} \sum_k \alpha_{i_1 k} \, \alpha_{i_2 k} \, (\Delta\ln X_k)^2 . \qquad (7) $$

We also generate random energy-averaged cross sections $C_{ij}$ with a multinormal distribution
centered on the theoretical energy-averaged cross sections $C^{(\mathrm{thr})}_{ij}$ corresponding to
$\widehat{\Delta m^2}$, $\widehat{\tan^2\vartheta}$ and having the completely correlated covariance matrix
for each independent experiment $j$

$$ V^{(C_j)}_{i_1 i_2} = C^{(\mathrm{thr})}_{i_1 j} \, \Delta\ln C^{(\mathrm{thr})}_{i_1 j} \; C^{(\mathrm{thr})}_{i_2 j} \, \Delta\ln C^{(\mathrm{thr})}_{i_2 j} . \qquad (8) $$

Then, we calculate the rates $R_j = \sum_i \phi_i C_{ij}$. Finally, we generate random synthetic
experimental rates with normal distribution centered on $R_j$ and standard deviation equal to
that of the actual experimental data ($\sigma_j$). The synthetic experimental rates are inserted in
the least-squares function (1) in place of $R^{(\mathrm{exp})}_j$ in order to find the minimum
$(X^2_{\min})_s$ and its associated best-fit parameters $\widehat{\Delta m^2}_{(s)}$,
$\widehat{\tan^2\vartheta}_{(s)}$.
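
One possible way to draw such a synthetic data set is sketched below. It relies on the fact that a covariance matrix of the form (7) is obtained by shifting all fluxes with one common standard normal deviate per SSM input parameter, and a completely correlated matrix of the form (8) with one common deviate per experiment. The function name and argument layout are our own choices, and any numerical inputs would have to be supplied by the user.

```python
import random


def synthetic_rates(phi_ssm, C_thr, alpha, dlnX, dlnC, sigma, rng):
    """Draw one synthetic set of experimental rates.

    phi_ssm[i]: SSM fluxes; C_thr[i][j]: energy-averaged cross sections at the
    assumed oscillation parameters; alpha[i][k], dlnX[k]: logarithmic
    derivatives and relative errors of the SSM inputs; dlnC[i][j]: relative
    cross-section errors; sigma[j]: experimental errors.
    """
    n_flux, n_exp, n_par = len(phi_ssm), len(sigma), len(dlnX)
    # One deviate per SSM input parameter, shared by all fluxes.
    z_x = [rng.gauss(0.0, 1.0) for _ in range(n_par)]
    phi = [
        phi_ssm[i] * (1.0 + sum(alpha[i][k] * dlnX[k] * z_x[k] for k in range(n_par)))
        for i in range(n_flux)
    ]
    rates = []
    for j in range(n_exp):
        # One deviate per experiment: complete correlation among the fluxes.
        z_c = rng.gauss(0.0, 1.0)
        C = [C_thr[i][j] * (1.0 + dlnC[i][j] * z_c) for i in range(n_flux)]
        R_j = sum(phi[i] * C[i] for i in range(n_flux))
        # Experimental fluctuation around the randomized theoretical rate.
        rates.append(rng.gauss(R_j, sigma[j]))
    return rates
```

Each call with a seeded generator, for example `random.Random(0)`, yields one synthetic data set that can be passed to the least-squares minimization.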

The results of our calculations are reported in Table I. The global minimum of the
least-squares function (1), $X^2_{\min} = 0.42$, occurs in the SMA region^4 for
$\Delta m^2 = 5.1 \times 10^{-6}\,\mathrm{eV}^2$ and $\tan^2\vartheta = 1.6 \times 10^{-3}$. The results reported in
the "SMA" row of Table I have been obtained taking
$\widehat{\Delta m^2} = 5.1 \times 10^{-6}\,\mathrm{eV}^2$ and $\widehat{\tan^2\vartheta} = 1.6 \times 10^{-3}$. We first
restricted the allowed region of the mixing parameters around the SMA region
($10^{-4} < \tan^2\vartheta < 3 \times 10^{-2}$ and $3 \times 10^{-7}\,\mathrm{eV}^2 < \Delta m^2 < 10^{-4}\,\mathrm{eV}^2$) and
obtained the local value of the goodness of fit, reported in the "local" column of Table I. This
value is almost equal to (even slightly larger than) the standard one obtained assuming a
$\chi^2$ distribution with one degree of freedom, reported in the "standard GOF" column of
Table I. Hence, we conclude that locally the usual method to evaluate the goodness of fit is
reliable.



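4 Here we use the standard terminology for the allowed regions (see [...]): SMA for
$\Delta m^2 \sim 5 \times 10^{-6}\,\mathrm{eV}^2$, $\tan^2\vartheta \sim 10^{-3}$; LMA for
$\Delta m^2 \sim 3 \times 10^{-5}\,\mathrm{eV}^2$, $\tan^2\vartheta \sim 0.3$; LOW for $\Delta m^2 \sim 10^{-7}\,\mathrm{eV}^2$,
$\tan^2\vartheta \sim 0.5$; VO for $\Delta m^2 < 10^{-8}\,\mathrm{eV}^2$.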
region | $X^2_{\min}$ and surrogate values                                                                  | standard GOF | local | MSW   | global
SMA    | $X^2_{\min} = 0.42$, $\Delta m^2 = 5.1 \times 10^{-6}\,\mathrm{eV}^2$, $\tan^2\vartheta = 1.6 \times 10^{-3}$ | 51.8%        | 53.7% | 48.4% | 39.6%
LMA    | $X^2_{\min} = 3.46$, $\Delta m^2 = 1.5 \times 10^{-5}\,\mathrm{eV}^2$, $\tan^2\vartheta = 0.30$             | 6.3%         | 6.1%  |       |
LOW    | $X^2_{\min} = 6.53$, $\Delta m^2 = 1.3 \times 10^{-7}\,\mathrm{eV}^2$, $\tan^2\vartheta = 0.55$             | 1.1%         | 1.9%  |       |
VO     | $X^2_{\min} = 1.29$, $\Delta m^2 = 9.4 \times 10^{-11}\,\mathrm{eV}^2$, $\tan^2\vartheta = 0.38$            | 25.6%        | 14.2% |       |

TABLE I. Goodness of fit of solar neutrino data calculated with more than one million
synthetic data sets. The first two columns indicate in which region the surrogate of the true
values of the neutrino oscillation parameters has been assumed to be, the corresponding values of
$X^2_{\min}$ and the values of the surrogates. The third column indicates the goodness of fit
calculated with the standard method, i.e. assuming a $\chi^2$ distribution with one degree of
freedom. The fourth column reports the goodness of fit calculated locally, i.e. restricting the
allowed values of the parameters around the region in which the assumed surrogates of the true
values lie. The fifth column reports the goodness of fit calculated restricting the allowed values
of the parameters to the MSW region (9). The sixth column reports the goodness of fit calculated
without any restriction on the allowed values of the parameters.

However, when we extend the allowed region of the mixing parameters to all the MSW
region

$$ 10^{-4} < \tan^2\vartheta < 2 , \qquad 10^{-8}\,\mathrm{eV}^2 < \Delta m^2 < 10^{-3}\,\mathrm{eV}^2 \qquad \text{(MSW region)} , \qquad (9) $$

and when we add also the VO region

$$ 0.1 < \tan^2\vartheta < 1 , \qquad 10^{-11}\,\mathrm{eV}^2 < \Delta m^2 < 10^{-8}\,\mathrm{eV}^2 \qquad \text{(VO region)} , \qquad (10) $$

we obtain the values reported, respectively, in the "MSW" and "global" columns of Table I,
which are significantly smaller than the one obtained with the standard method. As
remarked in Section II, this is due to the non-linear dependence of the theoretical rates
on the neutrino oscillation parameters, which implies that there are more possibilities to
obtain good fits of the data than in the linear case. Therefore, we conclude that
the standard method, although valid locally (when the allowed region of the parameters is
restricted around the SMA region the linear assumption is approximately correct), is not
valid in general and should not be trusted if there is more than one allowed region.

In order to check the local validity of the standard method we have also assumed that
$\widehat{\Delta m^2}$ and $\widehat{\tan^2\vartheta}$ have the values corresponding to the local minima of
$X^2$ in the LMA, LOW and VO regions, restricting the allowed values of the parameters around the
corresponding regions. The results are reported in the "LMA", "LOW" and "VO" rows of Table I. One
can see that the standard method is locally acceptable for the LMA and LOW solutions,
but it largely overestimates the goodness of fit in the case of the VO solution. This is due to
the fact that the theoretical rates are highly non-linear functions of the neutrino oscillation
parameters in the VO region (10), resulting in several disjoint allowed regions.

The "MSW" and "global" entries in the "LMA", "LOW" and "VO" rows of Table I
are empty because it is meaningless to calculate the goodness of fit allowing values of the
parameters in which the fit is better than the one in the assumed surrogate of the true values
of the parameters.

Summarizing the results of this section, we have shown that if there were only one allowed
region in the space of the neutrino oscillation parameters, or if there are valid reasons to
restrict the allowed region of the parameters around one of the SMA, LMA, LOW solutions,
the standard method to calculate the goodness of fit is approximately reliable. On the
other hand, if there is more than one allowed region, the standard method to calculate
the goodness of fit is not reliable and the goodness of fit must be calculated numerically,
with Monte Carlo, as we have done. This happens if one considers the MSW region (9) of
the neutrino oscillation parameters, which contains three allowed regions (SMA, LMA and
LOW), or the VO region (10), that contains several allowed regions, or all the parameter
space (MSW+VO).



IV. CONFIDENCE LEVEL OF ALLOWED REGIONS 

In order to calculate the confidence level of the allowed regions it is necessary first to
understand what its meaning is. The $100\beta\%$ CL allowed regions are defined by the property
that they belong to a set of allowed regions obtained with hypothetical experiments and
the regions belonging to this set cover (i.e. include) the true value of the parameters with
probability $\beta$.

Given the usual "$100\beta\%$ CL" allowed regions in the space of the neutrino oscillation
parameters we can calculate their confidence level $\beta_{\mathrm{MC}}$ with a method similar to the one
described in the previous section for the goodness of fit. We assume that $\widehat{\Delta m^2}$,
$\widehat{\tan^2\vartheta}$ are reasonable surrogates of the true values $\Delta m^2_{\mathrm{true}}$,
$\tan^2\vartheta_{\mathrm{true}}$ and we generate a large number of synthetic data sets. We apply the standard
procedure to each synthetic data set and obtain the corresponding "$100\beta\%$ CL" allowed regions
in the space of the neutrino oscillation parameters. Then we count the number of synthetic
"$100\beta\%$ CL" allowed regions that cover the assumed surrogate $\widehat{\Delta m^2}$,
$\widehat{\tan^2\vartheta}$ of the true values. The ratio of this number to the total
number of synthetically generated data sets gives the confidence level $\beta_{\mathrm{MC}}$ of the
"$100\beta\%$ CL" allowed regions.
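
The counting procedure can be illustrated with a toy calculation. In the sketch below, a synthetic allowed region covers the assumed true value when $X^2$ evaluated at that value does not exceed the synthetic $X^2_{\min}$ by more than the standard threshold. The one-parameter models, the grid scan and all numbers are hypothetical; since there is a single fitted parameter, the threshold is the one for one degree of freedom.

```python
import random
from statistics import NormalDist


def x2(data, rates, sigma):
    """Least-squares function for uncorrelated errors of equal size sigma."""
    return sum(((d - r) / sigma) ** 2 for d, r in zip(data, rates))


def coverage(model, p_true, grid, sigma, beta, n_sets=2000, seed=2):
    """Monte Carlo confidence level of the standard 100*beta% CL region."""
    rng = random.Random(seed)
    # Standard Delta X^2 threshold for ONE fitted parameter.
    threshold = NormalDist().inv_cdf(0.5 * (1.0 + beta)) ** 2
    predictions = [model(p) for p in grid]
    truth = model(p_true)
    covered = 0
    for _ in range(n_sets):
        data = [rng.gauss(r, sigma) for r in truth]
        x2_min = min(x2(data, rates, sigma) for rates in predictions)
        # The region covers p_true if X^2(p_true) <= X^2_min + threshold.
        covered += x2(data, truth, sigma) - x2_min <= threshold
    return covered / n_sets


def linear_model(p):
    """Hypothetical linear toy model, for which the standard method is exact."""
    return [p * x for x in (1.0, 2.0, 3.0)]


grid = [-1.0 + 0.02 * n for n in range(201)]  # scan of p between -1 and 3
```

For the linear toy model `coverage(linear_model, 1.0, grid, 1.0, 0.90)` should be close to 0.90 within the Monte Carlo error; replacing `linear_model` with a non-linear model having several minima is the situation in which smaller values are expected.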

The results of our calculations are reported in Table II. As we have done in the previous
section for the goodness of fit, we calculated first the local confidence levels restricting the
allowed values of the parameters around the region whose local minimum of $X^2$ gives the
assumed surrogates of the true values ("local" column of Table II). Then we calculated the
confidence levels restricting the allowed values of the parameters to the MSW region (9)
assuming the surrogates of the true values in the local minima of $X^2$ of the SMA, LMA
and LOW regions ("MSW" column of Table II). Finally, we calculated the confidence levels
without any restriction on the allowed values of the parameters, assuming the surrogates of
the true values in the local minima of $X^2$ of the SMA, LMA, LOW and VO regions ("global"
column of Table II).

region | standard CL            | local                  | MSW                    | global
SMA    | 90.00% ($1.64\sigma$)  | 90.11% ($1.65\sigma$)  | 87.22% ($1.52\sigma$)  | 86.44% ($1.49\sigma$)
       | 95.00% ($1.96\sigma$)  | 95.01% ($1.96\sigma$)  | 93.08% ($1.82\sigma$)  | 92.75% ($1.80\sigma$)
       | 99.00% ($2.58\sigma$)  | 99.00% ($2.58\sigma$)  | 98.51% ($2.43\sigma$)  | 98.42% ($2.41\sigma$)
       | 99.73% ($3.00\sigma$)  | 99.72% ($2.99\sigma$)  | 99.58% ($2.86\sigma$)  | 99.56% ($2.85\sigma$)
LMA    | 90.00% ($1.64\sigma$)  | 89.86% ($1.64\sigma$)  | 85.90% ($1.47\sigma$)  | 82.31% ($1.35\sigma$)
       | 95.00% ($1.96\sigma$)  | 94.93% ($1.95\sigma$)  | 92.35% ($1.77\sigma$)  | 90.57% ($1.67\sigma$)
       | 99.00% ($2.58\sigma$)  | 98.99% ($2.57\sigma$)  | 98.28% ($2.38\sigma$)  | 98.00% ($2.33\sigma$)
       | 99.73% ($3.00\sigma$)  | 99.73% ($3.00\sigma$)  | 99.52% ($2.82\sigma$)  | 99.45% ($2.78\sigma$)
LOW    | 90.00% ($1.64\sigma$)  | 92.53% ($1.78\sigma$)  | 86.59% ($1.50\sigma$)  | 83.70% ($1.40\sigma$)
       | 95.00% ($1.96\sigma$)  | 96.39% ($2.10\sigma$)  | 92.81% ($1.80\sigma$)  | 91.32% ($1.71\sigma$)
       | 99.00% ($2.58\sigma$)  | 99.33% ($2.71\sigma$)  | 98.34% ($2.40\sigma$)  | 97.98% ($2.32\sigma$)
       | 99.73% ($3.00\sigma$)  | 99.82% ($3.12\sigma$)  | 99.51% ($2.81\sigma$)  | 99.42% ($2.76\sigma$)
VO     | 90.00% ($1.64\sigma$)  | 86.29% ($1.49\sigma$)  |                        | 81.82% ($1.34\sigma$)
       | 95.00% ($1.96\sigma$)  | 92.99% ($1.81\sigma$)  |                        | 90.42% ($1.67\sigma$)
       | 99.00% ($2.58\sigma$)  | 98.68% ($2.48\sigma$)  |                        | 98.07% ($2.34\sigma$)
       | 99.73% ($3.00\sigma$)  | 99.69% ($2.96\sigma$)  |                        | 99.50% ($2.81\sigma$)

TABLE II. Confidence level of the usual 90%, 95%, 99% and 99.73% CL allowed regions. The
confidence levels have been calculated generating more than one million synthetic data sets. The
first column indicates in which region the surrogate of the true values of the neutrino oscillation
parameters has been assumed to be. The second column indicates the usual CL. The third column
reports the confidence levels calculated locally, i.e. restricting the allowed values of the
parameters around the region in which the assumed surrogates of the true values lie. The fourth
column reports the confidence levels calculated restricting the allowed values of the parameters
to the MSW region (9). The fifth column reports the confidence levels calculated without any
restriction on the allowed values of the parameters.

One can see that the values of the confidence levels calculated locally for the SMA region,
where the global minimum of $X^2$ lies, practically coincide with the standard ones ("standard
CL" column of Table II). However, when the allowed values of the parameters are extended
to the whole MSW region (9) or to the MSW and VO regions (global), the confidence levels
are significantly smaller than the standard ones.

The same trend, slightly more pronounced, is observed when the surrogates of the true
values of the parameters are assumed to correspond to the local minima of $X^2$ in the LMA
and LOW regions, with even a small deviation of the local confidence levels from the standard
ones (with unpredictable sign).

When the surrogates of the true values of the parameters are assumed to correspond
to the local minimum of $X^2$ in the VO region (10), the confidence levels are significantly
smaller than the standard ones, even those calculated locally. This is due to the fact that
the linear approximation used in the calculation of the standard confidence levels is badly
violated (there are several disjoint allowed VO regions with non-elliptical shapes).

Summarizing the results of this section, we have shown that the standard confidence 
levels of the allowed regions in the neutrino oscillation parameter space are approximately 
correct if only one of the SMA, LMA or LOW regions is considered to be allowed a priori. If
the oscillation parameters are restricted to the MSW region the confidence levels are sig- 
nificantly smaller than the standard ones, with some uncertainty depending on the assumed 
surrogates of the true values of the parameters. If one does not impose any restriction on 
the values of the parameters, the confidence levels decrease further. If only the VO regions 
are considered to be allowed even the confidence levels calculated locally are significantly 
smaller than the standard ones. 



V. EXACT ALLOWED REGIONS 

In the previous section we have calculated the confidence level of the allowed regions
in the neutrino oscillation parameter space obtained with the standard procedure based on
Eq. (5). This calculation is approximate, because it is based on the assumption of a surrogate
for the unknown true values of the neutrino oscillation parameters. Furthermore, we have
seen that the value of the confidence level is different if the surrogate for the unknown true
values of the neutrino oscillation parameters is assumed to be the value of the parameters
in the global minimum of $X^2$ or in one of the local minima.

Luckily, there is a well-known procedure for constructing exact confidence intervals
independently of the true values of the parameters. This procedure was invented by
Neyman in 1937 [39] (see also [40,3?,41]). It guarantees that the resulting confidence
intervals have correct frequentist coverage (see [42-46]), i.e. they belong to a set of confidence
intervals obtained with different or similar, real or hypothetical experiments that cover the
true values of the parameters with the desired probability given by the chosen confidence
level. In this section we apply this method in order to find confidence intervals with proper
coverage for the neutrino oscillation parameters.

Neyman's construction of exact frequentist confidence intervals with $100\beta\%$ confidence
level starts with the choice of an appropriate estimator of the parameters under investigation.
Then, for any possible value of the parameters one calculates an acceptance interval
with probability $\beta$, i.e. an interval of the estimator that contains $100\beta\%$ of the values of
the estimator obtained in a large series of trials. Several methods are available for the
construction of the acceptance intervals (see [40,3?,41,43,45,46] and references therein). If the
probability distribution of the estimator is known, the acceptance intervals can be calculated
analytically; if not, one can calculate the acceptance intervals with numerical Monte Carlo
methods. In general the acceptance intervals can be composed of disjoint sub-intervals. In
the case of $n$ parameters the acceptance intervals are regions in the $n$-dimensional parameter
space.

Once the $100\beta\%$ acceptance interval for each possible value of the parameters is
calculated, the $100\beta\%$ confidence interval is simply composed of all the parameter values whose
acceptance interval covers the measured value of the estimator (i.e. the actual estimate of
the parameters). If the acceptance intervals are composed of disjoint sub-intervals, the
confidence interval is also composed of disjoint sub-intervals. As we will see in the following,
this is what happens in the case of solar neutrino oscillations.

Our implementation of Neyman's construction goes as follows. First we choose as
estimator of neutrino oscillation parameters the values of the parameters in the minimum
$X^2_{\min}$ of the least-squares function (1). Since the probability distribution of the chosen
estimator is not known, we calculate it numerically with a Monte Carlo. We define an appropriate
grid in the 2-dimensional space of the neutrino oscillation parameters $\tan^2\vartheta$, $\Delta m^2$ and
for each value of the parameters on the grid we generate a large number of synthetic data sets.
For each data set we find the value of the parameters corresponding to the minimum of $X^2$.
This procedure gives the distribution of $X^2_{\min}$ for each value of the parameters on the grid.
Unfortunately this is a rather lengthy task that requires several days of computer time in
order to reach an acceptable accuracy, essentially because of the large number of points on
a reasonably fine grid, about five thousand in the MSW region (9) and six thousand in the
VO region (10).

We define the $100\beta\%$ acceptance intervals in the simplest and most natural way^5: for
each value of the parameters we choose the shortest possible acceptance interval, i.e. the one
containing the values of the parameters on the grid with highest probability, whose sum is
equal to or larger than $\beta$ (in general perfect equality is not reached because of the discrete
nature of the grid). In the case of a linear least-squares fit this method gives the allowed
regions obtained with the standard prescription (5). Therefore, our exact allowed regions
can be compared directly with the standard ones.
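
This selection rule can be written compactly. In the sketch below (the function name and the counts in the example are hypothetical), the estimator distribution for one assumed true value is represented by the number of synthetic data sets whose best fit fell on each grid point; grid points are accepted in order of decreasing probability until the accumulated probability reaches $\beta$.

```python
def acceptance_set(counts, beta):
    """Smallest set of grid points with total probability >= beta.

    counts maps a grid point (any hashable label) to the number of synthetic
    data sets whose best-fit parameters fell on that point.
    """
    total = sum(counts.values())
    accepted, accumulated = set(), 0
    # Visit the grid points from the most to the least probable.
    for point, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        if accumulated >= beta * total:
            break
        accepted.add(point)
        accumulated += n
    return accepted


# Hypothetical bimodal estimator distribution on a grid of five points.
counts = {0: 40, 1: 2, 2: 3, 3: 45, 4: 10}
print(sorted(acceptance_set(counts, 0.80)))
```

In this bimodal example the accepted points are not adjacent, which is the discrete analogue of an acceptance interval composed of disjoint sub-intervals.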

The acceptance intervals are 2-dimensional regions in the $\tan^2\vartheta$-$\Delta m^2$ parameter
space. Because of the non-linearity of the neutrino oscillation probability as a function of the
parameters, the acceptance intervals are composed of disjoint sub-intervals. This generates
2-dimensional confidence intervals composed of disjoint sub-intervals, some of which are far from
the values of the parameters corresponding to the actual $X^2_{\min}$. The confidence intervals are
composed of the values of the parameters whose acceptance interval includes the parameters
corresponding to the actual $X^2_{\min}$ (the measured value of the estimator).
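
The whole construction can be sketched for a one-parameter toy problem. The code below is an illustration under simplifying assumptions (hypothetical linear model, uncorrelated Gaussian errors of equal size, coarse grid, few synthetic data sets per grid point); it is not the implementation used for the solar neutrino analysis.

```python
import random


def best_fit_index(data, predictions, sigma):
    """Index of the grid point that minimizes the least-squares function."""
    def x2(rates):
        return sum(((d - r) / sigma) ** 2 for d, r in zip(data, rates))
    return min(range(len(predictions)), key=lambda k: x2(predictions[k]))


def neyman_confidence_set(observed, grid, model, sigma, beta, n_sets=300, seed=3):
    """Grid points whose acceptance set contains the observed best-fit point."""
    rng = random.Random(seed)
    predictions = [model(p) for p in grid]
    observed_index = best_fit_index(observed, predictions, sigma)
    confidence = []
    for t, p_true in enumerate(grid):
        # Estimator distribution for this assumed true value.
        counts = {}
        for _ in range(n_sets):
            data = [rng.gauss(r, sigma) for r in predictions[t]]
            k = best_fit_index(data, predictions, sigma)
            counts[k] = counts.get(k, 0) + 1
        # Acceptance set: most probable grid points up to probability beta.
        accepted, accumulated = set(), 0
        for k, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
            if accumulated >= beta * n_sets:
                break
            accepted.add(k)
            accumulated += n
        if observed_index in accepted:
            confidence.append(p_true)
    return confidence


def linear_model(p):
    """Hypothetical linear toy model with three 'rates'."""
    return [p * x for x in (1.0, 2.0, 3.0)]


grid = [0.1 * n for n in range(40)]  # scan of p between 0 and 3.9
```

Calling `neyman_confidence_set(linear_model(1.0), grid, linear_model, 0.5, 0.90)` returns the grid points around `p = 1` that belong to the 90% confidence set of this toy problem. With a non-linear model the returned points need not be contiguous.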

The procedure is illustrated qualitatively in Fig. 1, where the cross corresponds to the
actually measured $X^2_{\min}$, the union of the two vertically hatched regions is the acceptance
interval associated with $\tan^2\vartheta_A$, $\Delta m^2_A$, and the union of the three horizontally hatched
regions is the acceptance interval associated with $\tan^2\vartheta_B$, $\Delta m^2_B$. Since the acceptance
interval associated with $\tan^2\vartheta_A$, $\Delta m^2_A$ includes the point corresponding to $X^2_{\min}$, the point
$\tan^2\vartheta_A$, $\Delta m^2_A$ belongs to the confidence interval. On the other hand, the acceptance interval
associated with $\tan^2\vartheta_B$, $\Delta m^2_B$ does not include the point corresponding to $X^2_{\min}$ and the
point $\tan^2\vartheta_B$, $\Delta m^2_B$ is outside the confidence interval.
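
The inversion step can be sketched as follows, mirroring the situation of points A and B. The grid-cell coordinates and the helper name `confidence_region` are hypothetical and purely illustrative.

```python
def confidence_region(acceptance, measured):
    """Parameter points whose acceptance set contains the measured best-fit cell."""
    return {point for point, cells in acceptance.items() if measured in cells}

# Hypothetical acceptance sets, given as sets of grid cells (i, j).
acceptance = {
    "A": {(3, 4), (3, 5), (9, 1)},   # disjoint patches, one of which contains the cross
    "B": {(0, 0), (6, 7), (8, 8)},   # disjoint patches, none of which contains the cross
}
measured = (3, 5)                    # cell of the actually measured best-fit point
region = confidence_region(acceptance, measured)
```

Here only point A enters the confidence region, although part of its acceptance set lies far from the measured point.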



5 There is a subtle problem in choosing the method that defines the acceptance intervals: the 
method must be chosen independently of the data and the result. This is what we have done. 
Otherwise, the property of coverage is lost (see [H,0il-i6|), and one can always choose a method 
"ad hoc" to obtain any desired result. 

FIG. 1. Illustration of the acceptance intervals. The cross corresponds to the actual $X^2_{\min}$. The
two vertically hatched regions constitute the acceptance interval associated with $\tan^2\vartheta_A$, $\Delta m^2_A$.
The three horizontally hatched regions constitute the acceptance interval associated with $\tan^2\vartheta_B$,
$\Delta m^2_B$.

The results of our calculations are presented in Figs. 2-5, where we have depicted the
90%, 95%, 99% and 99.73% CL regions (gray areas) compared with those obtained with
the standard method based on Eq. (5) (areas enclosed by solid lines).

In Fig. 2 we have restricted the possible values of the neutrino oscillation parameters to
a region around the SMA solution, where $X^2_{\min}$ lies. The acceptance interval for each point
on the grid in the parameter space has been calculated by generating about $6 \times 10^5$ synthetic
data sets (different for different points on the grid, in order to avoid correlations). One
can see that the standard allowed SMA region is an acceptable approximation of the exact
confidence interval (see footnote 6). This is due to the fact that locally the linear approximation
is rather good, as we already found in the previous two sections.
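
To give a rough idea of the statistical precision that such a number of synthetic data sets provides, one can use the binomial standard error of a probability estimated from N independent trials. This is only a back-of-the-envelope sketch under the assumption of independent data sets. It is not part of the analysis above, and the function name is hypothetical.

```python
import math

def binomial_std_error(p, n_sets):
    """Standard error of a probability p estimated from n_sets independent trials."""
    return math.sqrt(p * (1.0 - p) / n_sets)

# Probability content of a 90% acceptance interval, estimated from 6 x 10^5 data sets.
err = binomial_std_error(0.90, 6e5)
```

The error decreases only as the inverse square root of the number of data sets, so improving the precision by a factor of ten requires a hundred times more data sets.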

In Fig. 3 we have extended the possible values of the neutrino oscillation parameters to
all the MSW region (9). For this figure the number of synthetic data sets for each point on
the grid is about $7 \times 10^4$ (less than in Fig. 2 because of the larger size of the grid, which slows
down the calculation). The standard SMA region is still an acceptable approximation of the
exact SMA region, but the exact LMA and LOW regions are dramatically larger than the



6 Here the adjective "exact" refers to the method, which produces confidence intervals with exact
coverage. Obviously our confidence intervals are approximations of the exact ones, which would be
obtained with an infinitely dense grid in parameter space and an infinite set of synthetic random
data sets.

standard ones, so large that they merge together, producing a huge allowed region around
maximal mixing ($\tan^2\vartheta = 1$). This is true even at 90% CL.

Figures 4 and 5 show, respectively, the allowed MSW and VO regions when there is
no restriction on the possible values of the neutrino oscillation parameters (the number of
synthetic data sets for each point on the grid is now about $6.5 \times 10^4$). Again, one can see
that the standard SMA region is an acceptable approximation of the exact SMA region, but
the exact LMA, LOW and VO regions are much larger than the standard ones.

From the results of our calculations we conclude that the standard method to calculate
allowed regions produces reliable results only locally, i.e. in the calculation of the allowed
region surrounding the global minimum of $X^2$. The other allowed regions are dramatically
underestimated by the standard method.



VI. CONCLUSIONS 

We have presented the results of a numerical Monte Carlo calculation of the goodness
of fit and the confidence level of the standard allowed regions for the neutrino oscillation
parameters $\Delta m^2$, $\tan^2\vartheta$ obtained from the fit of solar neutrino data. We have shown that
the goodness of fit and the confidence level of the allowed regions
are significantly overestimated by the standard method. This is due to the non-linear
dependence of the neutrino oscillation probability on the parameters. The linear approximation,
which leads to the standard values of the goodness of fit and of the confidence level of
the allowed regions, is valid only locally, for values of the parameters around a specific MSW
solution (SMA, LMA, LOW). In the case of the VO solution the linear approximation is
not valid even locally, because of the strong non-linearity of the oscillation probability, which
causes the existence of several allowed regions close together.

We have also calculated exact allowed regions with correct frequentist coverage using
Neyman's method. The results of these calculations show that the standard allowed region
around the global minimum of the least-squares function is a reasonable approximation of the
exact one, whereas the size of the other regions is dramatically underestimated
in the standard method. Indeed, in our calculation the exact SMA region, which contains
the minimum of the least-squares function, practically coincides with the standard one. On
the other hand, the exact LMA and LOW regions are much larger than the standard ones,
so much so that they merge into a huge allowed region around maximal mixing. The exact
allowed VO regions are also much larger than the standard ones.

The indications on neutrino mixing coming from solar neutrino data are becoming
increasingly important for theory and experiment. Furthermore, solar neutrino data will soon
be enriched by the results of new powerful experiments (SNO [47], Borexino [48], GNO [49]
and others [50]). As we have shown, the standard statistical analysis of solar neutrino data
can lead to incorrect conclusions concerning the goodness of fit, the confidence level of the
allowed regions and the size of the allowed regions far from the global minimum of the
least-squares function. Hence, we believe that it is time to examine critically the method of
statistical analysis of solar neutrino data and bring it to the level of quality already attained
in other branches of research in high-energy physics.

FIG. 2. Allowed 90%, 95%, 99%, 99.73% confidence level regions in the $\tan^2\vartheta$-$\Delta m^2$ plane. In
each plot the gray area is the allowed region with exact frequentist coverage obtained restricting
the possible values of $\tan^2\vartheta$ and $\Delta m^2$ to a region around the SMA solution, where $X^2_{\min}$ lies. The
area enclosed by the solid line is the standard SMA allowed region.

FIG. 3. Allowed 90%, 95%, 99%, 99.73% confidence level regions in the $\tan^2\vartheta$-$\Delta m^2$ plane.
The gray areas are the allowed regions with exact frequentist coverage obtained considering all
possible values of $\tan^2\vartheta$ and $\Delta m^2$ in the MSW region (the whole area of the plots). The areas
enclosed by the solid lines are the standard SMA, LMA and LOW allowed regions.

FIG. 4. Allowed 90%, 95%, 99%, 99.73% confidence level regions in the $\tan^2\vartheta$-$\Delta m^2$ plane.
The gray areas are the allowed regions with exact frequentist coverage in the MSW region obtained
without any restriction on the possible values of $\tan^2\vartheta$ and $\Delta m^2$. The areas enclosed by the solid
lines are the standard SMA, LMA and LOW allowed regions.

FIG. 5. Allowed 90%, 95%, 99%, 99.73% confidence level regions in the $\tan^2\vartheta$-$\Delta m^2$ plane.
The gray areas are the allowed regions with exact frequentist coverage in the VO region obtained
without any restriction on the possible values of $\tan^2\vartheta$ and $\Delta m^2$. The areas enclosed by the solid
lines are the standard VO allowed regions.